PSFM-DBT: Identifying DNA-Binding Proteins by Combing Position Specific Frequency Matrix and Distance-Bigram Transformation
Abstract
DNA-binding proteins play crucial roles in many biological processes, such as DNA replication and repair, transcriptional regulation, and other DNA-associated activities. Experimental techniques for identifying DNA-binding proteins are both time-consuming and expensive, so effective methods that identify these proteins from protein sequences alone are in high demand. The key to sequence-based methods is an effective representation of protein sequences. Previous studies have reported that evolutionary information is crucial for DNA-binding protein identification. In this study, we employed four methods to extract evolutionary information from the Position Specific Frequency Matrix (PSFM): Residue Probing Transformation (RPT), Evolutionary Difference Transformation (EDT), Distance-Bigram Transformation (DBT), and Trigram Transformation (TT). Each method converts a PSFM into a fixed-length feature vector, and each was combined with Support Vector Machines (SVMs) to construct four predictors: PSFM-RPT, PSFM-EDT, PSFM-DBT, and PSFM-TT. Experimental results on the widely used benchmark dataset PDB1075 and the independent dataset PDB186 showed that all four methods achieved state-of-the-art performance and that PSFM-DBT outperformed other existing methods in this field. For practical applications, a user-friendly web server for PSFM-DBT was established, which is available at http://bioinformatics.hitsz.edu.cn/PSFM-DBT/.
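The abstract does not give the DBT formula, so the following is only a minimal sketch of how a distance-bigram style transformation could turn a variable-length PSFM into a fixed-length vector. It assumes the PSFM is an L x 20 matrix of per-position amino acid frequencies and that each feature is the average product of the frequency of residue type i at position p and residue type j at position p + d. The function names `distance_bigram` and `dbt_features` and the parameter `max_distance` are hypothetical, and the paper's actual definition and normalization may differ.

```python
from typing import List

# Assumed layout: one row per sequence position, one column per amino acid type.
Matrix = List[List[float]]


def distance_bigram(psfm: Matrix, distance: int) -> List[float]:
    """Average co-occurrence of column i at position p and column j at p + distance.

    Returns n_cols * n_cols values (400 for a standard 20-column PSFM).
    This formula is an assumption, not taken from the paper.
    """
    length = len(psfm)
    n_cols = len(psfm[0]) if psfm else 0
    pairs = length - distance  # number of position pairs separated by `distance`
    features = [0.0] * (n_cols * n_cols)
    if pairs <= 0:
        return features  # sequence too short for this distance: all zeros
    for p in range(pairs):
        row_a, row_b = psfm[p], psfm[p + distance]
        for i, a in enumerate(row_a):
            if a == 0.0:
                continue  # a zero frequency contributes nothing
            base = i * n_cols
            for j, b in enumerate(row_b):
                features[base + j] += a * b
    return [f / pairs for f in features]


def dbt_features(psfm: Matrix, max_distance: int) -> List[float]:
    """Concatenate the bigram features for distances 1..max_distance."""
    vector: List[float] = []
    for d in range(1, max_distance + 1):
        vector.extend(distance_bigram(psfm, d))
    return vector
```

Under these assumptions the output length depends only on the number of columns and `max_distance`, not on the protein length, which is what lets proteins of different lengths be passed to a classifier such as an SVM.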
Related resources
Motif GibbsGA: Sampling Transcription Factor Binding Sites Coupled with PSFM Optimization by Genetic Algorithm
Identification of transcription factor binding sites (TFBSs) or motifs plays an important role in deciphering the mechanisms of gene regulation. Although many experimental and computational methods have been developed, finding TFBSs remains a challenging problem. We propose and develop a novel sampling based motif finding method coupled with PSFM optimization by genetic algorithm, which we call...
Identifying short motifs by means of extreme value analysis
The problem of detecting a binding site – a substring of DNA where transcription factors attach – on a long DNA sequence requires the recognition of a small pattern in a large background. For short binding sites, the matching probability can display large fluctuations from one putative binding site to another. Here we use a self-consistent statistical procedure that accounts correctly for the l...
GNBSL: a new integrative system to predict the subcellular location for Gram-negative bacteria proteins.
This paper proposes a new integrative system (GNBSL: Gram-negative bacteria subcellular localization) for subcellular localization, specialized for Gram-negative bacteria proteins. First, the system generates a position-specific frequency matrix (PSFM) and a position-specific scoring matrix (PSSM) for each protein sequence by searching the Swiss-Prot database. Then different features are extra...
Protein Structural Class Prediction via k-Separated Bigrams Using Position Specific Scoring Matrix
Protein structural class prediction (SCP) is an important task in identifying protein tertiary structure and protein functions. In this study, we propose a feature extraction technique to predict secondary structures. The technique utilizes bigram (of adjacent and k-separated amino acids) information derived from the Position Specific Scoring Matrix (PSSM). The technique has shown promising results...
A novel three-stage distance-based consensus ranking method
In this study, we propose a three-stage weighted sum method for identifying the group ranks of alternatives. In the first stage, a rank matrix, similar to the cross-efficiency matrix, is obtained by computing the individual rank position of each alternative based on importance weights. In the second stage, a secondary goal is defined to limit the vector of weights since the vector of weights ob...
Journal title:
Volume 18, Issue
Pages -
Publication date: 2017